equation discovery
Bayesian Spline Learning for Equation Discovery of Nonlinear Dynamics with Quantified Uncertainty
Nonlinear dynamics are ubiquitous in science and engineering applications, but the physics of most complex systems is far from being fully understood. Discovering interpretable governing equations from measurement data can help us understand and predict the behavior of complex dynamic systems. Although extensive work has recently been done in this field, robustly distilling explicit model forms from very sparse data with considerable noise remains intractable. Moreover, quantifying and propagating the uncertainty of the identified system from noisy data is challenging, and relevant literature is still limited. To bridge this gap, we develop a novel Bayesian spline learning framework to identify parsimonious governing equations of nonlinear (spatio)temporal dynamics from sparse, noisy data with quantified uncertainty. The proposed method utilizes spline basis to handle the data scarcity and measurement noise, upon which a group of derivatives can be accurately computed to form a library of candidate model terms. The equation residuals are used to inform the spline learning in a Bayesian manner, where approximate Bayesian uncertainty calibration techniques are employed to approximate posterior distributions of the trainable parameters. To promote the sparsity, an iterative sequential-threshold Bayesian learning approach is developed, using the alternative direction optimization strategy to systematically approximate L0 sparsity constraints. The proposed algorithm is evaluated on multiple nonlinear dynamical systems governed by canonical ordinary and partial differential equations, and the merit/superiority of the proposed method is demonstrated by comparison with state-of-the-art methods.
SURFACEBENCH: Can Self-Evolving LLMs Find the Equations of 3D Scientific Surfaces?
Kabra, Sanchit, Kriplani, Shobhnik, Shojaee, Parshin, Reddy, Chandan K.
Equation discovery from data is a core challenge in machine learning for science, requiring the recovery of concise symbolic expressions that govern complex physical and geometric phenomena. Recent approaches with large language models (LLMs) show promise in symbolic regression, but their success often hinges on memorized formulas or overly simplified functional forms. Existing benchmarks exacerbate this limitation: they focus on scalar functions, ignore domain grounding, and rely on brittle string-matching based metrics that fail to capture scientific equivalence. We introduce SurfaceBench, first comprehensive benchmark for symbolic surface discovery. SurfaceBench comprises 183 tasks across 15 categories of symbolic complexity, spanning explicit, implicit, and parametric equation representation forms. Each task includes ground-truth equations, variable semantics, and synthetically sampled three dimensional data. Unlike prior SR datasets, our tasks reflect surface-level structure, resist LLM memorization through novel symbolic compositions, and are grounded in scientific domains such as fluid dynamics, robotics, electromagnetics, and geometry. To evaluate equation discovery quality, we pair symbolic checks with geometry-aware metrics such as Chamfer and Hausdorff distances, capturing both algebraic fidelity and spatial reconstruction accuracy. Our experiments reveal that state-of-the-art frameworks, while occasionally successful on specific families, struggle to generalize across representation types and surface complexities. SurfaceBench thus establishes a challenging and diagnostic testbed that bridges symbolic reasoning with geometric reconstruction, enabling principled benchmarking of progress in compositional generalization, data-driven scientific induction, and geometry-aware reasoning with LLMs. We release the code here: https://github.com/Sanchit-404/surfacebench
- North America > United States > New York > New York County > New York City (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Virginia (0.04)
- (2 more...)
Adaptable Hindsight Experience Replay for Search-Based Learning
Vazaios, Alexandros, Brugger, Jannis, Derstroff, Cedric, Kersting, Kristian, Mezini, Mira
AlphaZero-like Monte Carlo Tree Search systems, originally introduced for two-player games, dynamically balance exploration and exploitation using neural network guidance. This combination makes them also suitable for classical search problems. However, the original method of training the network with simulation results is limited in sparse reward settings, especially in the early stages, where the network cannot yet give guidance. Hindsight Experience Replay (HER) addresses this issue by relabeling unsuccessful trajectories from the search tree as supervised learning signals. We introduce Adaptable HER (\ours{}), a flexible framework that integrates HER with AlphaZero, allowing easy adjustments to HER properties such as relabeled goals, policy targets, and trajectory selection. Our experiments, including equation discovery, show that the possibility of modifying HER is beneficial and surpasses the performance of pure supervised or reinforcement learning.
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.06)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Norway > Eastern Norway > Oslo (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Research Report > Promising Solution (0.34)
- Research Report > New Finding (0.34)
Approximating the universal thermal climate index using sparse regression with orthogonal polynomials
Roman, Sabin, Skok, Gregor, Todorovski, Ljupco, Dzeroski, Saso
This article explores novel data-driven modeling approaches for analyzing and approximating the Universal Thermal Climate Index (UTCI), a physiologically-based metric integrating multiple atmospheric variables to assess thermal comfort. Given the nonlinear, multivariate structure of UTCI, we investigate symbolic and sparse regression techniques as tools for interpretable and efficient function approximation. In particular, we highlight the benefits of using orthogonal polynomial bases-such as Legendre polynomials-in sparse regression frameworks, demonstrating their advantages in stability, convergence, and hierarchical interpretability compared to standard polynomial expansions. We demonstrate that our models achieve significantly lower root-mean squared losses than the widely used sixth-degree polynomial benchmark-while using the same or fewer parameters. By leveraging Legendre polynomial bases, we construct models that efficiently populate a Pareto front of accuracy versus complexity and exhibit stable, hierarchical coefficient structures across varying model capacities. Training on just 20% of the data, our models generalize robustly to the remaining 80%, with consistent performance under bootstrapping. The decomposition effectively approximates the UTCI as a Fourier-like expansion in an orthogonal basis, yielding results near the theoretical optimum in the L2 (least squares) sense. We also connect these findings to the broader context of equation discovery in environmental modeling, referencing probabilistic grammar-based methods that enforce domain consistency and compactness in symbolic expressions. Taken together, these results illustrate how combining sparsity, orthogonality, and symbolic structure enables robust, interpretable modeling of complex environmental indices like UTCI - and significantly outperforms the state-of-the-art approximation in both accuracy and efficiency.
- Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
- Europe > Portugal (0.04)
- Europe > Poland (0.04)
- (2 more...)
EDC: Equation Discovery for Classification
Equation Discovery techniques have shown considerable success in regression tasks, where they are used to discover concise and interpretable models (\textit{Symbolic Regression}). In this paper, we propose a new ED-based binary classification framework. Our proposed method EDC finds analytical functions of manageable size that specify the location and shape of the decision boundary. In extensive experiments on artificial and real-life data, we demonstrate how EDC is able to discover both the structure of the target equation as well as the value of its parameters, outperforming the current state-of-the-art ED-based classification methods in binary classification and achieving performance comparable to the state of the art in binary classification. We suggest a grammar of modest complexity that appears to work well on the tested datasets but argue that the exact grammar -- and thus the complexity of the models -- is configurable, and especially domain-specific expressions can be included in the pattern language, where that is required. The presented grammar consists of a series of summands (additive terms) that include linear, quadratic and exponential terms, as well as products of two features (producing hyperbolic curves ideal for capturing XOR-like dependencies). The experiments demonstrate that this grammar allows fairly flexible decision boundaries while not so rich to cause overfitting.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Netherlands > South Holland > Leiden (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Structured Kolmogorov-Arnold Neural ODEs for Interpretable Learning and Symbolic Discovery of Nonlinear Dynamics
Liu, Wei, Bacsa, Kiran, Tang, Loon Ching, Chatzi, Eleni
Understanding and modeling nonlinear dynamical systems is a fundamental challenge across science and engineering. Deep learning has shown remarkable potential for capturing complex system behavior, yet achieving models that are both accurate and physically interpretable remains difficult. To address this, we propose Structured Kolmogorov-Arnold Neural ODEs (SKANODEs), a framework that integrates structured state-space modeling with Kolmogorov-Arnold Networks (KANs). Within a Neural ODE architecture, SKANODE employs a fully trainable KAN as a universal function approximator to perform virtual sensing, recovering latent states that correspond to interpretable physical quantities such as displacements and velocities. Leveraging KAN's symbolic regression capability, SKANODE then extracts compact, interpretable expressions for the system's governing dynamics. Extensive experiments on simulated and real-world systems demonstrate that SKANODE achieves superior predictive accuracy, discovers physics-consistent dynamics, and reveals complex nonlinear behavior. Notably, it identifies hysteretic behavior in an F-16 aircraft and recovers a concise symbolic equation describing this phenomenon. SKANODE thus enables interpretable, data-driven discovery of physically grounded models for complex nonlinear dynamical systems.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- North America > United States (0.46)
- Europe > United Kingdom > North Sea > Southern North Sea (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Data Science (0.68)
Multi-Task Equation Discovery
Bee, S C, Dervilis, N, Worden, K, Bull, L A
Equation discovery provides a grey-box approach to system identification by uncovering governing dynamics directly from observed data. However, a persistent challenge lies in ensuring that identified models generalise across operating conditions rather than over-fitting to specific datasets. This work investigates this issue by applying a Bayesian relevance vector machine (RVM) within a multi-task learning (MTL) framework for simultaneous parameter identification across multiple datasets. In this formulation, responses from the same structure under different excitation levels are treated as related tasks that share model parameters but retain task-specific noise characteristics. A simulated single degree-of-freedom oscillator with linear and cubic stiffness provided the case study, with datasets generated under three excitation regimes. Standard single-task RVM models were able to reproduce system responses but often failed to recover the true governing terms when excitations insufficiently stimulated non-linear dynamics. By contrast, the MTL-RVM combined information across tasks, improving parameter recovery for weakly and moderately excited datasets, while maintaining strong performance under high excitation. These findings demonstrate that multi-task Bayesian inference can mitigate over-fitting and promote generalisation in equation discovery. The approach is particularly relevant to structural health monitoring, where varying load conditions reveal complementary aspects of system physics.
Data-Efficient Symbolic Regression via Foundation Model Distillation
Ying, Wangyang, Zhang, Jinghan, Bai, Haoyue, Gong, Nanxu, Wang, Xinyuan, Liu, Kunpeng, Reddy, Chandan K., Fu, Yanjie
Discovering interpretable mathematical equations from observed data (a.k.a. equation discovery or symbolic regression) is a cornerstone of scientific discovery, enabling transparent modeling of physical, biological, and economic systems. While foundation models pre-trained on large-scale equation datasets offer a promising starting point, they often suffer from negative transfer and poor generalization when applied to small, domain-specific datasets. In this paper, we introduce EQUATE (Equation Generation via QUality-Aligned Transfer Embeddings), a data-efficient fine-tuning framework that adapts foundation models for symbolic equation discovery in low-data regimes via distillation. EQUATE combines symbolic-numeric alignment with evaluator-guided embedding optimization, enabling a principled embedding-search-generation paradigm. Our approach reformulates discrete equation search as a continuous optimization task in a shared embedding space, guided by data-equation fitness and simplicity. Experiments across three standard public benchmarks (Feynman, Strogatz, and black-box datasets) demonstrate that EQUATE consistently outperforms state-of-the-art baselines in both accuracy and robustness, while preserving low complexity and fast inference. These results highlight EQUATE as a practical and generalizable solution for data-efficient symbolic regression in foundation model distillation settings.
- North America > United States > Arizona > Maricopa County > Tempe (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > South Carolina (0.04)
- (3 more...)
Bayesian symbolic regression: Automated equation discovery from a physicists' perspective
Guimera, Roger, Sales-Pardo, Marta
Symbolic regression automates the process of learning closed-form mathematical models from data. Standard approaches to symbolic regression, as well as newer deep learning approaches, rely on heuristic model selection criteria, heuristic regularization, and heuristic exploration of model space. Here, we discuss the probabilistic approach to symbolic regression, an alternative to such heuristic approaches with direct connections to information theory and statistical physics. We show how the probabilistic approach establishes model plausibility from basic considerations and explicit approximations, and how it provides guarantees of performance that heuristic approaches lack. We also discuss how the probabilistic approach compels us to consider model ensembles, as opposed to single models.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)